Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review

Authors: De Jong Yeong, Gustavo Velasco-Hernandez, John Barry, Joseph Walsh

Presentation by: Aleksandar Avdalovic

Time of Presentation: 4th of February 2025

Blog post by: Bassel Succar

Link to Paper:

https://www.mdpi.com/1424-8220/21/6/2140

Summary of the Paper

The paper "Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review" provides a comprehensive overview of the role of sensors in autonomous vehicles (AVs), emphasizing their importance in perception, localization, and decision-making. It examines key sensor technologies such as cameras, LiDAR, and radar, discussing their strengths, limitations, and performance under various environmental conditions. The paper highlights the necessity of sensor calibration as a prerequisite for accurate data fusion and object detection, reviewing available open-source calibration tools. Additionally, it categorizes sensor fusion approaches into high-level, mid-level, and low-level fusion, evaluating state-of-the-art algorithms that enhance object detection and overall driving safety. The review concludes by addressing challenges in sensor fusion, such as data synchronization and environmental adaptability, while proposing future research directions for improving autonomous vehicle technology.

Slide 3: Introduction to Autonomous Vehicles

Introduction to Autonomous Vehicles Image

The slide titled "Introduction to Autonomous Vehicles" provides an overview of key aspects of autonomous driving systems. It highlights concerns related to road safety, the need for automation, and market growth. Examples of current autonomous vehicle implementations include Waymo's SAE Level 4 taxi service in Arizona and Tesla & Audi's SAE Level 2 systems. The slide explains the SAE J3016 automation levels, ranging from Level 0 (No Automation) to Level 5 (Full Automation). Finally, it emphasizes that autonomous vehicles must be capable of handling unpredictable environments, harsh weather conditions, and complex traffic interactions.

Slide 4: AV System Architecture

AV System Architecture Image

The slide "AV System Architecture" explains the structure of autonomous vehicle systems from both technical and functional perspectives. The technical perspective consists of a hardware layer that includes sensors, processing units, communication modules, and actuators, while the software layer integrates machine learning, artificial intelligence, data collection, real-time control, and user interface/user experience. The functional perspective covers key processes such as perception, which involves sensing and interpreting the environment, planning and decision-making based on sensor data, motion and vehicle control for movement and navigation, and supervision to ensure safe and efficient operation. The diagram on the right visually represents the interaction between hardware, software, and functional components in autonomous vehicle systems.

Slide 5: Sensor Technology and Fusion in AV

Sensor Technology and Fusion in AV Image

The slide "Sensor Technology and Fusion in AV" emphasizes the role of sensors in enabling autonomous vehicle (AV) perception, safety, and efficiency. It differentiates between smart sensors (e.g., cameras, LiDAR, and radar) that process data onboard, and non-smart sensors that generate raw data requiring external processing. Multi-sensor fusion is highlighted as a key technique to compensate for the limitations of individual sensors, supporting applications like localization, mapping, and object detection. The slide also mentions advanced fusion methods using deep learning-based sensor fusion techniques. An intuitive example illustrates how a self-driving car uses data from a camera, radar, and LiDAR to identify a pedestrian in the rain and slow down accordingly.

Slide 6: Sensor Technology in AV

Sensor Technology in AV Image

The slide "Sensor Technology in AV" explains how sensors detect environmental changes and convert them into measurable data. It differentiates between passive sensors, which use energy from the environment (such as cameras capturing light), and active sensors, which emit energy and measure responses (such as LiDAR and radar). The slide also categorizes sensors based on their function, distinguishing proprioceptive sensors, which measure the vehicle’s internal state like force and angular velocity, from exteroceptive sensors, which gather data about the vehicle’s external environment. The accompanying diagram illustrates the various sensor types and their applications in autonomous vehicles.

Slide 7: Cameras in AVs

Cameras in AVs Image

The slide "Cameras in AVs" discusses the advantages, types, and challenges of using cameras in autonomous vehicles. Cameras are relatively low-cost compared to other sensors, provide high-resolution images, and can detect both static and moving obstacles, including road signs and traffic lights. The three main types of cameras used are monocular cameras, which lack depth perception, binocular (stereo) cameras, which use two cameras for depth perception, and fisheye cameras, which provide a wide-angle view for 360-degree coverage, making them useful for parking and traffic assistance. However, cameras face challenges such as optical distortion, sensitivity to weather conditions, and high computational load for processing image data.

Slide 8: LiDAR in AVs

LiDAR in Autonomous Vehicles Image

The slide "LiDAR in Autonomous Vehicles" explains LiDAR as a remote sensing technology that emits laser pulses and measures reflection time to estimate distance. It creates a 3D point cloud of the surroundings, enabling precise perception for autonomous vehicles. The slide categorizes LiDAR into three types: 1D LiDAR, which measures only distance (x-coordinates), 2D LiDAR, which adds angular measurement (y-coordinates), and 3D LiDAR, which captures full spatial depth including x, y, and z coordinates. It also differentiates between mechanical LiDAR, which uses rotary lenses for 360-degree scanning and is widely used in AV research, and solid-state LiDAR, which employs micro-structured waveguides for improved robustness and lower cost but typically has a limited field of view (FoV) of less than 120 degrees.

Slide 9: Radar in AVs

Radar in Autonomous Vehicles Image

The slide "Radar in AVs" explains that radar, or Radio Detection and Ranging, uses electromagnetic waves to detect objects. It measures distance, speed, and relative motion using the Doppler effect and operates at 24 GHz, 60 GHz, 77 GHz, and 79 GHz frequencies. The Doppler shift is used to determine motion by measuring changes in wave frequency. When an object moves toward the radar, the frequency increases (resulting in shorter waves), while moving away causes the frequency to decrease (resulting in longer waves). A formula is provided to calculate the Doppler frequency shift based on factors such as the relative speed of the target, signal frequency, speed of light, and wavelength of emitted energy.

Slide 10: Radar in AVs

Radar in Autonomous Vehicles Image

The slide discusses the capabilities and limitations of radar technology in autonomous vehicles. Radar can effectively detect metal objects such as road signs and guardrails but struggles with identifying object shapes and differentiating between obstacles and the road. The slide also highlights different types of radar systems used in AVs. Short-Range Radar (SRR) is used for parking assistance and collision warning, Mid-Range Radar (MRR) is utilized for side and rear collision detection and blind-spot monitoring, while Long-Range Radar (LRR) supports adaptive cruise control and highway obstacle detection. An image visualization on the right demonstrates false-positive detections in radar-based object recognition.

Slide 11: Sensor Calibration & Fusion

Sensor Calibration & Fusion Image

The slide "Sensor Calibration & Fusion" highlights the importance of sensor calibration and fusion in autonomous vehicles. Sensor calibration ensures accurate positioning and orientation by addressing intrinsic distortions, aligning multiple sensors in a shared frame (extrinsic calibration), and synchronizing sensor timing (temporal calibration). Sensor fusion combines data from multiple sensors to improve object detection reliability. The Multi-Sensor Data Fusion (MSDF) framework is used to align sensor data using rotation and translation matrices while integrating object detection outputs from different sensors. The accompanying diagram illustrates the sensor alignment process and how data fusion supports tracking and decision-making in AVs.

Slide 12: Intrinsic Camera Calibration in AVs

Intrinsic Camera Calibration in AVs Image

The slide "Intrinsic Camera Calibration in AVs" explains the process of intrinsic calibration, which corrects lens distortions and estimates camera-specific parameters. The pinhole camera model is commonly used for projecting 3D points onto a 2D image plane. Key intrinsic parameters include focal length (fx, fy), which determines image scaling, principal point (cx, cy), which defines the optical center, and skew (s), which accounts for non-orthogonal axes. The diagram illustrates how light rays pass through a pinhole camera to form an image.

Slide 13: Camera Projection Matrix

Camera Projection Matrix Image

The slide "Camera Projection Matrix" explains the mathematical model used to convert 3D world coordinates (Xw, Yw, Zw) into 2D image coordinates (x, y). The camera projection matrix (P) is a 4×3 matrix composed of the intrinsic matrix (K), which defines camera-specific parameters, and the extrinsic matrix [R|t], which represents rotation and translation transformations. The extrinsic transformation maps 3D world points to camera coordinates, while the intrinsic matrix projects them onto the image plane. The slide also mentions Zhang’s calibration method, a widely used checkerboard-based approach for camera calibration.

Slide 14: Extrinsic Calibration Overview

Extrinsic Calibration Overview Image

The slide "Extrinsic Calibration Overview" explains how multiple sensors are aligned in a shared 3D coordinate system by estimating position and orientation (6 Degrees of Freedom - DoF) relative to an external reference frame. This process outputs a 3×4 transformation matrix that includes rotation (R) and translation (t). Challenges in extrinsic calibration include matching different sensor data, such as aligning camera images with LiDAR point clouds, and ensuring precise alignment of multi-sensor measurements. Calibration methods include target-based approaches, which use checkerboards, circular markers, or reflectors for accurate alignment, and targetless methods, which estimate motion from sensor data but are sensitive to environmental conditions.

Slide 15: Joint Extrinsic Calibration

Joint Extrinsic Calibration Image

The slide "Joint Extrinsic Calibration" explains the process of aligning radar, LiDAR, and cameras into a common reference frame. The calibration target consists of four circular holes for camera and LiDAR detection, along with a metallic trihedral corner reflector to enhance radar detection. Various calibration methods are discussed, including Pose & Structure Estimation (PSE), which estimates board locations, Minimally Connected Pose Estimation (MCPE), which uses a reference sensor, and Fully Connected Pose Estimation (FCPE), which optimizes transformations between all sensors. The accompanying image illustrates the proposed calibration target design used for multi-sensor alignment.

Slide 16: Temporal Calibration

Temporal Calibration Image

The slide "Temporal Calibration" discusses the importance of time synchronization between multi-sensor data streams operating at different frequencies. Sensors have different latencies, such as cameras capturing at 30 FPS while LiDAR operates at 5 Hz, and communication and processing delays can introduce time misalignment. Calibration approaches include external synchronization, which relies on GPS or hardware clocks, and internal synchronization, which uses timestamps within sensor data. Two methods are highlighted: Approximate Time Synchronizer (ROS), which matches messages based on timestamps, and Spatial-Temporal Calibration, which estimates both sensor positions and time delays using Gaussian Processes (GPs).

Slide 17: Sensor Fusion

Sensor Fusion Techniques & Algorithms Image

The slide "Sensor Fusion" highlights different fusion techniques combining cameras, LiDAR, and radar to enhance autonomous vehicle perception. Camera-Radar (CR) fusion provides high-resolution images along with obstacle velocity detection and is used by Tesla. Camera-LiDAR (CL) fusion enhances depth perception and object recognition. Camera-LiDAR-Radar (CLR) fusion combines the strengths of all three sensors, offering precise object detection and distance measurement, and is used by companies like Waymo and Navya. The accompanying table compares the capabilities of individual sensors versus fusion, showing that sensor fusion significantly improves performance across factors such as distance accuracy, object detection, and weather adaptability.

Slide 18: Sensor Fusion

Sensor Fusion Techniques & Algorithms Image

The slide categorizes fusion methods into High-Level Fusion (HLF), Low-Level Fusion (LLF), and Mid-Level Fusion (MLF). HLF processes each sensor’s data independently and then fuses the outputs using methods like non-linear Kalman filtering, making it simple to implement but potentially discarding low-confidence detections. LLF fuses raw sensor data before detection, retaining all sensor information for higher accuracy, but requires precise extrinsic and temporal calibration. MLF combines extracted features, such as color from images and depth from LiDAR, offering a balance between raw data and decision-making, though it may lose contextual information necessary for Level 4/5 autonomy.
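To illustrate the high-level-fusion idea in its simplest form, the sketch below merges two independently produced range estimates by inverse-variance weighting, which is the static form of a Kalman update; the measurement values and variances are assumptions.

```python
# Minimal sketch of high-level fusion: each sensor produces its own estimate, and only
# the outputs are fused. Two independent range estimates are merged by inverse-variance
# weighting (the static form of a Kalman update); the numbers are illustrative assumptions.

def fuse_estimates(z1: float, var1: float, z2: float, var2: float):
    """Return the fused estimate and its variance for two independent measurements."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    return fused, 1.0 / (w1 + w2)

camera_range, camera_var = 24.8, 4.0   # camera: coarser range estimate (m, m^2)
radar_range, radar_var = 25.6, 0.25    # radar: more accurate range estimate

print(fuse_estimates(camera_range, camera_var, radar_range, radar_var))
```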

Slide 19: Sensor Fusion Techniques & Algorithms

Sensor Fusion Techniques & Algorithms Image

The slide "Sensor Fusion Techniques & Algorithms" compares classical and deep learning-based approaches for sensor fusion in autonomous vehicles. Classical algorithms use knowledge-based, statistical, and probabilistic methods to handle data uncertainty and noise. Deep learning approaches, including CNNs, RNNs, YOLO, SSD, ResNet, and CenterNet, process raw sensor data and extract features automatically. Frameworks like VoxelFusion and PointFusion combine image and point cloud data. While deep learning methods provide superior object detection and high detection speeds (e.g., YOLO at 45–65 FPS), they require large, high-quality datasets and are computationally intensive. The choice between classical and deep learning methods depends on real-time performance and accuracy requirements for the application.

Slide 20: Challenges in Sensor Fusion

Challenges in Sensor Fusion Image

The slide "Challenges in Sensor Fusion" highlights key issues in autonomous vehicle sensor integration. Data volume and computation pose challenges as AVs generate vast amounts of data, requiring significant processing power for real-time fusion. Calibration and synchronization are critical, necessitating precise extrinsic (spatial) and temporal calibration. Data quality and model robustness are concerns since poor sensor data leads to inaccurate perception, and deep learning models are susceptible to adversarial attacks and biases. Environmental factors, such as fog, rain, and snow, can degrade sensor performance, necessitating robust fallback strategies and human intervention options to maintain system reliability.

Slide 21: Conclusions

Conclusions Image

Key takeaways from the study on sensor fusion in autonomous vehicles:

Future directions for improving sensor fusion:

Slide 22: Discussion

Discussion Image

The discussion section raises important questions about the challenges and security concerns in autonomous vehicle sensor fusion:

These questions highlight the trade-offs in autonomous vehicle perception, balancing security, privacy, and sensor capabilities.

Discussion

Question 1: Are smart sensors better than non-smart sensors?

Obiora and George noted that smart sensors help distribute the computing load, making them useful for efficient real-time processing, although their effectiveness depends on the available resources. The professor added that the decision ultimately depends on the budget and system goals. While smart sensors offer advanced capabilities, they also introduce security risks and additional safety costs, as Aleksandar pointed out.

Question 2: What is the final output of radar?

The group agreed that radar measures speed, distance, and relative motion of objects. The professor further explained that radar data can be used to calculate round-trip time and the Doppler shift, which helps determine whether an object is moving toward or away from the vehicle.

Audience Questions

Q1: Is each sensor calibrated twice?

Yes, each sensor is first calibrated extrinsically (relative to a global reference frame), and then all sensors are jointly calibrated to ensure alignment.

Q2: Can you explain sensor fusion simply?

The professor provided an example of a car driving in the snow. If one sensor is obstructed by snowfall, others like radar or LiDAR can compensate, allowing for more accurate perception and decision-making.

Q3: Does combining multiple sensors improve predictions?

The professor emphasized that multiple sensors enhance prediction accuracy. By fusing data from different sources, discrepancies caused by individual sensor failures can be minimized.

Security Considerations

Q4: How difficult is it to attack multiple sensors simultaneously?

Obiora and George noted that executing a multi-sensor attack is challenging: if one sensor is compromised, discrepancies with the remaining sensors would raise suspicion, making a full-system failure unlikely. The group concluded that attacking all sensors at once is difficult.

Privacy & Alternative Sensor Usage

Q5: Can autonomous vehicles function without cameras?

Bassel and Ruslan pointed out that certain functionalities could operate effectively without cameras, relying on LiDAR and radar for perception.

Optimization Strategies

Q6: Should we develop new sensors or optimize existing ones?

Obiora and George suggested that rather than creating entirely new sensors, optimizing the performance of current sensors through advanced fusion techniques could lead to more reliable and cost-effective solutions.